On Robustness Properties in Empirical Centroid Fictitious Play
Abstract

Empirical Centroid Fictitious Play (ECFP) is a generalization of the well-known Fictitious Play (FP) algorithm designed for 
implementation in large-scale games. In ECFP, the set of players is subdivided into equivalence classes with players in the same 
class possessing similar properties. Players choose a next-stage action by tracking and responding to aggregate statistics related
to each equivalence class. This setup alleviates the difficult task of tracking and responding to the statistical behavior of every 
individual player, as is the case in traditional FP. Aside from ECFP, many useful modifications have been proposed to classical FP, 
e.g., rules allowing for network-based implementation, increased computational efficiency, and stronger forms of learning. Such 
modifications tend to be of great practical value; however, their effectiveness relies heavily on two fundamental properties of FP: 
robustness to alterations in the empirical distribution step size process, and robustness to best-response perturbations. The main 
contribution of the paper is to show that similar robustness properties also hold for the ECFP algorithm. This result serves as a 
first step in enabling practical modifications to ECFP, similar to those already developed for FP. 


I. Introduction 

The field of learning in games is concerned with the study of systems of interacting agents, and in particular, the question 
of how simple behavior rules applied at the level of individual agents can lead to desirable global behavior. Fictitious Play 
(FP) m is one of the best studied game-theoretic learning algorithms. While attractive for its intuitive simplicity and proven 
convergence results, certain practical issues make FP prohibitively difficult to implement in games with a large number of 
players llJ-llll. 

Empirical Centroid FP (ECFP) i), H) is a recently proposed generalization of FP designed for implementation in large 
games. In ECFP, the set of players is subdivided into sets of “equivalence classes” of players sharing similar properties. In this 
formulation, players only track and respond to an aggregate statistic (the empirical centroid) for each class of players, rather 
than tracking and responding to statistical properties of every individual player, as in classical FP. ECFP has been shown to 
learn elements of the set of symmetric Nash equilibria for the class of multi-player games known as potential games. 

The main focus of this paper will be to study ECFP and show that certain desirable properties possessed by classical FP 
also hold for the more general ECFP. In particular, the work [6] studied classical FP and proved that the fundamental learning
properties of FP can be retained in the following scenarios:

(i) The step size sequence of the empirical distribution process takes on a form other than $\{1/t\}_{t \ge 1}$.

(ii) Players are permitted to make suboptimal choices when choosing a next-stage action so long as the degree of suboptimality 
decays asymptotically to zero with time. 

We say a FP-type algorithm is step-size robust if it retains its fundamental learning properties in the first scenario, and we 
say an algorithm is best-response robust if it retains its fundamental learning properties in the second scenario. 
FCT and by FCT Grant CMU-PT/SIA/0026/2009, and was partially supported by NSF grant ECCS-1306128.
The notion of step-size robustness generalizes the concept of the empirical distribution of classical FP. A player’s empirical
distribution in classical FP is taken to be the time-averaged histogram of the player’s action history; implicitly, this has an
incremental step size of $1/t$. Scenario (i) allows players to choose alternate step-size sequences. Of particular interest is that
it allows for construction of an empirical distribution that places more emphasis on recent observations while discounting 
observations from the distant past. 

The notion of best-response robustness generalizes FP by relaxing the traditional assumption that players are always perfect 
optimizers. In particular, in classical FP, it is assumed that players are capable of choosing their next-stage action as a (precise) 
best response to the empirical action history of opposing players. In practice, this is a stringent assumption, requiring that 
players have perfect knowledge of the empirical distribution of all opposing players at all times, and are capable of precisely 
solving a (non-trivial) optimization problem each iteration of the algorithm. By relaxing this implicit assumption slightly (as 
in scenario (ii)), one is able to consider many useful extensions of FP of both practical and theoretical value. 

In lib], the best-response robustness of FP was used to show convergence to the set of Nash equilibria of stochastic FP 
with vanishing smoothing, and to prove convergence of an FP-inspired actor-critic learning algorithm. In 13, best-response 
robustness of FP was used to show convergence of sampled FP—a variant of FP in which computational complexity is 
mitigated by approximating the expected utility using a Monte-Carlo method—and used again in Q to ensure convergence of 
an even more computationally efficient version of sampled FP. In 13, the best-response robustness of FP is used to construct a 
variant of FP achieving a strong form of learning in which the player’s period-by-period strategies are guaranteed to converge 
to equilibrium (rather than only convergence in terms of the empirical frequencies, as is typical in FP). The best-response 
robustness of FP is also useful in that it allows for practical network-based implementations of FP; e.g., 0. 

The main contribution of this paper is to demonstrate that ECFP is both step-size robust and best-response robust; i.e., ECFP
retains its fundamental learning properties under scenarios (i) and (ii) above. This result is a necessary first step in order to
develop practical modifications for ECFP similar in spirit to those already developed for FP; e.g., improved computational
efficiency, network-based implementation rules, and strongly convergent variants of the algorithm, as mentioned above.^1 We
prove the result following a similar line of reasoning to 0, 0; we first study a continuous-time version of ECFP, and then
use results from the theory of stochastic approximations to prove our main result regarding convergence of discrete-time ECFP
based on properties of the continuous-time counterpart.

The remainder of the paper is organized as follows. Section II sets up the notation to be used in the subsequent development
and reviews the classical FP algorithm. Section III presents discrete-time ECFP and states the main result. Section IV reviews
relevant results in differential inclusions and stochastic approximations to be used in the proof of the main result. Section
V presents continuous-time ECFP. Section VI proves convergence of discrete-time ECFP using properties of continuous-time
ECFP. Section VII provides concluding remarks.


II. Preliminaries 


A. Game Theoretic Preliminaries 

A review of game-theoretic learning algorithms, including classical FP, can be found in cni, M-

^1 The results of this paper are directly applied in to prove a strong learning result for a variant of ECFP. We also note that one possible network-based
implementation of ECFP has been presented in [5]. This implementation, which considers a fixed communication graph topology and synchronous
communication rules, relies on a weak form of best-response robustness (see 0, A.3). In order to consider ECFP in more general distributed scenarios (e.g.,
random communication graph topology and asynchronous communication rules) it is necessary to have the full robustness property derived in this paper.
A normal form game is given by the triple $\Gamma = (N, (Y_i)_{i \in N}, (u_i(\cdot))_{i \in N})$, where $N = \{1, \ldots, n\}$ represents the set of
players, $Y_i$, a finite set of cardinality $m_i$, denotes the action space of player $i$, and $u_i(\cdot) : \prod_{i=1}^{n} Y_i \to \mathbb{R}$ represents the utility
function of player $i$.

Throughout this paper we assume: 

A.1. All players use identical utility functions.

Under this assumption we drop the subscript $i$ and denote by $u(\cdot)$ the utility function used by all players. The set of mixed
strategies for player $i$ is given by $\Delta_i = \{p \in \mathbb{R}^{m_i} : \sum_{k=1}^{m_i} p(k) = 1,\ p(k) \ge 0\ \forall k = 1, \ldots, m_i\}$, the $m_i$-simplex. A mixed
strategy $p_i \in \Delta_i$ may be thought of as a probability distribution from which player $i$ samples to choose an action. The set of
joint mixed strategies is given by $\Delta^n = \prod_{i=1}^{n} \Delta_i$. A joint mixed strategy is represented by the $n$-tuple $(p_1, \ldots, p_n)$, where
$p_i \in \Delta_i$ represents the marginal strategy of player $i$, and it is implicitly assumed that players’ strategies are independent.

The mixed utility function is given by $U(\cdot) : \Delta^n \to \mathbb{R}$, where,

$$U(p_1, \ldots, p_n) := \sum_{y \in Y} u(y)\, p_1(y_1) \cdots p_n(y_n).$$

Note that $U(\cdot)$ may be interpreted as the expected value of $u(y)$ given that the players’ mixed strategies are statistically
independent. For convenience, the notation $U(p)$ will often be written as $U(p_i, p_{-i})$, where $p_i \in \Delta_i$ is the mixed strategy for
player $i$, and $p_{-i}$ indicates the joint mixed strategy for all players other than $i$.

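For $\epsilon \ge 0$, $i \in N$ and $p_{-i} \in \Delta_{-i}$, define the $\epsilon$-best response set for player $i$ as

$$BR_i^{\epsilon}(p_{-i}) := \{p_i \in \Delta_i : U(p_i, p_{-i}) \ge \max_{\alpha_i \in \Delta_i} U(\alpha_i, p_{-i}) - \epsilon\}$$

and for $p \in \Delta^n$ define

$$BR^{\epsilon}(p) := (BR_1^{\epsilon}(p_{-1}), \ldots, BR_n^{\epsilon}(p_{-n})).$$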
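The $\epsilon$-best response condition can be checked numerically. The following is a minimal sketch, assuming a small game stored as a dictionary `u` from joint pure actions to payoffs; the names `mixed_utility` and `is_eps_best_response` are illustrative and do not appear in the paper. Because $U$ is linear in a player's own strategy, the maximum over $\Delta_i$ is attained at a pure action, so only pure deviations are tried.

```python
from itertools import product


def mixed_utility(u, strategies):
    """U(p_1,...,p_n): sum over joint actions y of u[y] * prod_i p_i(y_i)."""
    total = 0.0
    for y in product(*(range(len(p)) for p in strategies)):
        weight = 1.0
        for p, yi in zip(strategies, y):
            weight *= p[yi]
        total += u[y] * weight
    return total


def is_eps_best_response(u, strategies, i, eps):
    """True if strategies[i] is within eps of player i's best payoff."""
    m = len(strategies[i])
    best = float("-inf")
    for k in range(m):
        # U is linear in p_i, so checking the vertices of the simplex suffices.
        pure = [1.0 if j == k else 0.0 for j in range(m)]
        trial = strategies[:i] + [pure] + strategies[i + 1:]
        best = max(best, mixed_utility(u, trial))
    return mixed_utility(u, strategies) >= best - eps


# Hypothetical two-player coordination game: payoff 1 when actions match.
u = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}
opponent = [0.8, 0.2]
exact = is_eps_best_response(u, [[1.0, 0.0], opponent], 0, 0.0)
loose = is_eps_best_response(u, [[0.9, 0.1], opponent], 0, 0.1)
tight = is_eps_best_response(u, [[0.9, 0.1], opponent], 0, 0.05)
```

In this example the slightly mixed strategy loses 0.06 in expected payoff relative to the exact best response, so it is accepted for $\epsilon = 0.1$ and rejected for $\epsilon = 0.05$.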


The set of Nash equilibria is given by 

$$NE := \{p \in \Delta^n : U(p_i, p_{-i}) \ge U(p_i', p_{-i}),\ \forall p_i' \in \Delta_i,\ \forall i\}.$$

As a matter of convention, all equalities and inequalities involving random objects are to be interpreted almost surely (a.s.) 
with respect to the underlying probability measure, unless otherwise stated. 

B. Repeated Play 

The learning algorithms considered in this paper assume the following format of repeated play. 

Let a normal form game $\Gamma$ be fixed. Let players repeatedly face off in the game $\Gamma$, and for $t \in \{1, 2, \ldots\}$, let $a_i(t) \in \Delta_i$
denote the action played by player $i$ in round $t$.^2 Let the $n$-tuple $a(t) = (a_1(t), \ldots, a_n(t))$ denote the joint action at time $t$.

Denote by $q_i(t) \in \Delta_i$ the empirical distribution^3 of player $i$. The precise manner in which the empirical distribution is
formed will depend on the algorithm at hand. In general, $q_i(t)$ is formed as a function of the action history $\{a_i(s)\}_{s=1}^{t}$ and
serves as a compact representation of the action history of player $i$ up to and including the round $t$. The joint empirical
distribution is given by $q(t) := (q_1(t), \ldots, q_n(t))$.

^2 An action is usually assumed to be a pure strategy, or a vertex of the simplex $\Delta_i$. In this work, an action is permitted to be an arbitrary mixed strategy (cf.
[6] for the case of FP). Since the results hold for any actions of this form, they also hold for the typical case where actions are restricted to be pure strategies.

^3 The term empirical distribution is often used to refer explicitly to the time-averaged histogram of the action choices of some player $i$; i.e., $q_i(t) = \frac{1}{t} \sum_{s=1}^{t} a_i(s)$.
However, using a broader definition, as considered here, allows for interesting algorithmic generalizations; e.g., learning processes that place
greater emphasis on observations of more recent actions. See 0 for further discussion.

C. Classical Fictitious Play 

FP may be intuitively described as follows. Players repeatedly face off in a stage game $\Gamma$. In any given stage of the game,
players choose a next-stage action by assuming (perhaps incorrectly) that opponents are using stationary and independent
strategies. In particular, let the empirical distribution be given by the time-averaged histogram

$$q_i(t) := \frac{1}{t} \sum_{s=1}^{t} a_i(s); \tag{1}$$

in FP, players use the empirical distribution of each opponent’s past play as a prediction of the opponent’s behavior in the
upcoming round and choose a next-round strategy that is optimal (i.e., a best response) given this prediction.

A sequence of actions $\{a(t)\}_{t \ge 1}$ such that^4

$$a_i(t+1) \in BR_i(q_{-i}(t)), \quad \forall i,$$

for all $t \ge 1$, is referred to as a fictitious play process. It has been shown that FP achieves Nash equilibrium learning in the
sense that $d(q(t), NE) \to 0$ as $t \to \infty$ for select classes of games including two-player zero-sum games ifT^ . two-player
two-move games lfT3l . and multi-player potential games ([141, IfTSl .
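As an illustration of the process just described, the following is a minimal sketch of two-player classical FP with the time-averaged histogram (1). It assumes pure actions, a shared payoff matrix `payoff[a1][a2]` (identical interests), and ties broken toward the lowest action index; the function names and the example game are hypothetical and are not taken from the paper.

```python
def best_response(payoff, opponent_q):
    """Pure best response (lowest index wins ties) to the opponent's histogram."""
    expected = [sum(row[b] * opponent_q[b] for b in range(len(opponent_q)))
                for row in payoff]
    return expected.index(max(expected))


def fictitious_play(payoff, first_actions, rounds):
    """Two-player FP; returns the empirical distributions after `rounds` rounds."""
    m = len(payoff)
    # Player 2 sees the same payoffs with the roles of the indices swapped.
    payoff_t = [[payoff[a][b] for a in range(m)] for b in range(m)]
    counts = [[0] * m, [0] * m]
    actions = list(first_actions)  # a(1) may be chosen arbitrarily
    q = None
    for t in range(1, rounds + 1):
        for i, a in enumerate(actions):
            counts[i][a] += 1
        # Time-averaged histogram of each player's action history, as in (1).
        q = [[c / t for c in counts[i]] for i in range(2)]
        actions = [best_response(payoff, q[1]), best_response(payoff_t, q[0])]
    return q


# Coordination game with pure Nash equilibria (0, 0) and (1, 1).
payoff = [[2.0, 0.0], [0.0, 1.0]]
q = fictitious_play(payoff, first_actions=(1, 0), rounds=200)
```

Starting from the miscoordinated pair (1, 0), both players settle on action 0 after two rounds, and the empirical distributions approach the pure equilibrium (0, 0).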

D. Empirical Centroid FP Setup 

A presentation of ECFP in its most elementary form (i.e., all players are grouped into a single equivalence class) is given
in 0; the elementary formulation is less notationally involved, and can serve as a useful means of conveying the basic ideas
of the approach in a straightforward manner. In this paper we focus on the general formulation of the ECFP algorithm. 

In ECFP, players are grouped into sets of equivalence classes, or “permutation invariant” classes. Such grouping allows 
players to analyze collective behavior by tracking only the statistics of each equivalence class, rather than tracking the statistics 
of every individual player. 

Let $m \le n$ denote the number of classes, let $I = \{1, \ldots, m\}$ be an index set, and let $\mathcal{C} = \{C_1, \ldots, C_m\}$ be a collection
of subsets of $N$; i.e., $C_k \subseteq N$, $\forall k \in I$. A collection $\mathcal{C}$ is said to be a permutation-invariant partition of $N$ if,

(i) $C_k \cap C_\ell = \emptyset$, for $k, \ell \in I$, $k \ne \ell$,

(ii) $\bigcup_{k \in I} C_k = N$,

(iii) for $k \in I$, $i, j \in C_k$, $Y_i = Y_j$,

(iv) for $k \in I$, $i, j \in C_k$, there holds for any strategy profile $y = (y_i, y_j, y_{-(i,j)}) \in Y$,

$$u(y_i, y_j, y_{-(i,j)}) = u([y_j]_i, [y_i]_j, y_{-(i,j)}),$$

where the notation $([y_j]_i, [y_i]_j, y_{-(i,j)})$ indicates a permutation of (only) the strategies of players $i$ and $j$ in the strategy profile
$y = (y_i, y_j, y_{-(i,j)})$.

For a collection $\mathcal{C}$, define $\phi(\cdot) : N \to I$ to be the unique mapping such that $\phi(i) = k$ if and only if $i \in C_k$.


^4 In all learning algorithms discussed in this paper, the initial action $a_i(1)$ may be chosen arbitrarily for all $i$.
For $k \in I$, $p \in \Delta^n$, and permutation-invariant partition $\mathcal{C}$, define

$$\bar p^k := |C_k|^{-1} \sum_{i \in C_k} p_i \tag{2}$$

to be the $k$-th centroid with respect to $\mathcal{C}$, where $|C_k|$ denotes the cardinality of the set $C_k$. Likewise, for $p \in \Delta^n$ define

$$\bar p := (\bar p_1, \bar p_2, \ldots, \bar p_n), \tag{3}$$

where $\bar p_i := \bar p^{\phi(i)}$, to be the centroid distribution with respect to $\mathcal{C}$.
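The centroid construction in (2) and (3) amounts to averaging within each class and then giving every member of the class that average. A minimal sketch follows, assuming strategies are stored as lists of probabilities and the partition as a list of lists of zero-based player indices; the function name and the data layout are illustrative choices, not notation from the paper.

```python
def centroid_distribution(p, classes):
    """Return p_bar, where p_bar[i] is the average strategy of player i's class."""
    p_bar = [None] * len(p)
    for members in classes:
        m = len(p[members[0]])  # all members of a class share one action space
        centroid = [sum(p[i][k] for i in members) / len(members) for k in range(m)]
        for i in members:
            p_bar[i] = list(centroid)
    return p_bar


# Three players; players 0 and 1 form one class and player 2 is alone.
p = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
p_bar = centroid_distribution(p, [[0, 1], [2]])
```

Here players 0 and 1 play opposite pure actions, so their class centroid is the uniform strategy, while the singleton class leaves player 2's strategy unchanged.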

Given a permutation-invariant partition $\mathcal{C}$, let the set of symmetric Nash equilibria (relative to $\mathcal{C}$) be given by,

$$SNE := \{p \in NE : p_i = p_j\ \forall i, j \in C_k,\ \forall k \in I\},$$

and let the set of mean-centric equilibria (relative to $\mathcal{C}$) be given by,

$$MCE := \{p \in \Delta^n : U(p_i, \bar p_{-i}) \ge U(p_i', \bar p_{-i}),\ \forall p_i' \in \Delta_i,\ \forall i\}.$$

The set of MCE is neither a strict superset nor subset of the NE; rather, it is a set of natural equilibrium points tailored to
the ECFP dynamics IfT^ . The set of SNE however, is contained in the set of MCE.

The sets of SNE and MCE relative to a partition $\mathcal{C}$ can be shown to be non-empty under A.1 using fixed point arguments
similar to na, El.


III. Empirical Centroid Fictitious Play

Let the game $\Gamma$ be played repeatedly as in Section II-B. Let the empirical distribution for player $i$ be formed recursively
with $q_i(1) = a_i(1)$ and for $t \ge 1$,

$$q_i(t+1) = q_i(t) + \gamma_t \left( a_i(t+1) - q_i(t) \right), \tag{4}$$

where we assume:

A.2. The sequence $\{\gamma_t\}_{t \ge 1}$ in (4) satisfies $\gamma_t > 0$, $\forall t$, $\sum_{t \ge 1} \gamma_t = \infty$, $\lim_{t \to \infty} \gamma_t = 0$.

Let the joint empirical distribution be given by $q(t) := (q_1(t), \ldots, q_n(t))$.

Typical FP-type learning algorithms^5 consider the empirical distribution to be a time-averaged histogram that places equal
weight on all rounds; this corresponds to a step size of form $\gamma_t = \frac{1}{t+1}$ (e.g., (1)). If an FP-type algorithm retains its
fundamental learning properties under the more general assumption A.2, then we say the algorithm is step-size robust.
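The effect of the step-size sequence in the recursion (4) can be seen on a short action history. The sketch below is illustrative only: the function names, the four-round history, and the alternative step size $(t+1)^{-0.6}$ are assumptions made for the example (that sequence is positive, non-summable, and vanishing, so it is consistent with A.2).

```python
def update_empirical(q, action, step):
    """One step of recursion (4): q(t+1) = q(t) + step * (a(t+1) - q(t))."""
    return [qi + step * (ai - qi) for qi, ai in zip(q, action)]


def empirical_distribution(actions, step_size):
    """Run recursion (4) over a(1), a(2), ...; step_size(t) returns gamma_t."""
    q = list(actions[0])  # q(1) = a(1)
    for t, a in enumerate(actions[1:], start=1):
        q = update_empirical(q, a, step_size(t))
    return q


# Two rounds of action 0 followed by two rounds of action 1.
history = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
uniform = empirical_distribution(history, lambda t: 1.0 / (t + 1))
recent = empirical_distribution(history, lambda t: 1.0 / (t + 1) ** 0.6)
```

With $\gamma_t = 1/(t+1)$ the result is the plain time average, which weights both actions equally; the slower-decaying step size puts more weight on the two most recent rounds.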

In ECFP la, players do not track the empirical distribution of each individual player. Instead, they track only the centroid
$\bar q^k(t)$ for each $k \in I$ (see (2)). Intuitively speaking, in ECFP each player $i$ assumes (perhaps incorrectly) that for each class
$C_k \in \mathcal{C}$ the centroid $\bar q^k(t)$ accurately represents the mixed strategy for all players $j \in C_k$. Each player $i$ chooses her next-stage
action as a myopic best response given this assumption.

Formally, the joint action at time $(t+1)$ is chosen according to the rule^6

$$a(t+1) \in BR^{\epsilon_t}(\bar q(t)), \tag{5}$$

where $\bar q(t)$ is the centroid distribution associated with $q(t)$ (see (3)), and where it is assumed that,

A.3. The sequence $\{\epsilon_t\}_{t \ge 1}$ in (5) satisfies $\lim_{t \to \infty} \epsilon_t = 0$.

Typical FP-type learning algorithms assume that players are always perfect optimizers; i.e., $\epsilon_t = 0$, $\forall t$. If an FP-type learning
algorithm retains its fundamental learning properties under A.3 we say the algorithm is best-response robust. The work [6]
first considered generalizations to FP of the forms indicated in A.2–A.3 and showed that classical FP is both step-size robust
and best-response robust. In this work we show that ECFP is also robust in both these senses.

^5 We use the term FP-type learning algorithm to refer to an algorithm in which players choose their next-stage action as a myopic best response to some
forecast rule based on the current time-averaged empirical distribution of play; cf. the learning framework considered in fM

^6 The action $a(1)$ may be chosen arbitrarily.

Combining (4) with (5) gives the following difference inclusion governing the behavior of $\{q(t)\}_{t \ge 1}$,

$$q(t+1) \in (1 - \gamma_t)\, q(t) + \gamma_t\, BR^{\epsilon_t}(\bar q(t)). \tag{6}$$
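The step from (4) and (5) to (6) is a direct substitution. Written out, with (4) stacked over all players $i \in N$ (this display only restates the two equations and adds nothing beyond them):

```latex
\begin{align*}
q(t+1) &= q(t) + \gamma_t \bigl( a(t+1) - q(t) \bigr)
        && \text{by (4), stacked over } i \in N \\
       &= (1 - \gamma_t)\, q(t) + \gamma_t\, a(t+1) \\
       &\in (1 - \gamma_t)\, q(t) + \gamma_t\, BR^{\epsilon_t}(\bar q(t))
        && \text{by (5).}
\end{align*}
```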

Likewise, Lemma 5 (see appendix) shows that the sequence of centroid distributions $\{\bar q(t)\}_{t \ge 1}$ follows the difference inclusion,

$$\bar q(t+1) \in (1 - \gamma_t)\, \bar q(t) + \gamma_t\, BR^{\epsilon_t}(\bar q(t)). \tag{7}$$

We refer to the sequence $\{q(t), \bar q(t)\}_{t \ge 1}$ as a discrete-time ECFP (DT-ECFP) process with respect to $(\Gamma, \mathcal{C})$.
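A DT-ECFP process can be simulated directly from (4) and (5). The following is a minimal sketch under stated assumptions, not the authors' implementation: the game, the single-class partition, the step size $(t+1)^{-0.6}$, and the perturbation scheme are all hypothetical choices. Each player mixes an exact best response to the centroid with the uniform strategy using weight `noise(t)`; since $U$ is linear in a player's own strategy, this is an $\epsilon_t$-best response with $\epsilon_t$ at most `noise(t)` times the payoff range, so $\epsilon_t \to 0$ as A.3 requires.

```python
from itertools import product


def mixed_utility(u, strategies):
    """U(p): sum over joint actions y of u[y] * prod_i p_i(y_i)."""
    total = 0.0
    for y in product(*(range(len(p)) for p in strategies)):
        weight = 1.0
        for p, yi in zip(strategies, y):
            weight *= p[yi]
        total += u[y] * weight
    return total


def centroid_distribution(q, classes):
    """Replace each player's distribution by the average over that player's class."""
    q_bar = [None] * len(q)
    for members in classes:
        m = len(q[members[0]])
        avg = [sum(q[i][k] for i in members) / len(members) for k in range(m)]
        for i in members:
            q_bar[i] = list(avg)
    return q_bar


def ecfp(u, classes, first_actions, rounds, step, noise):
    """Perturbed ECFP: step(t) is gamma_t, noise(t) is a vanishing mixing weight."""
    q = [list(a) for a in first_actions]  # q_i(1) = a_i(1)
    n = len(q)
    for t in range(1, rounds):
        q_bar = centroid_distribution(q, classes)
        actions = []
        for i in range(n):
            m = len(q[i])
            # Payoff of each pure action when all others play the centroid.
            values = []
            for k in range(m):
                pure = [1.0 if j == k else 0.0 for j in range(m)]
                values.append(mixed_utility(u, q_bar[:i] + [pure] + q_bar[i + 1:]))
            best = values.index(max(values))
            d = noise(t)
            actions.append([(1.0 - d) * (1.0 if k == best else 0.0) + d / m
                            for k in range(m)])
        g = step(t)
        q = [[qi + g * (ai - qi) for qi, ai in zip(q[i], actions[i])]
             for i in range(n)]
    return q, centroid_distribution(q, classes)


# Three players, two actions, one class: payoff 2 if all play 0, 1 if all play 1.
u = {y: 0.0 for y in product(range(2), repeat=3)}
u[(0, 0, 0)] = 2.0
u[(1, 1, 1)] = 1.0
q, q_bar = ecfp(u, [[0, 1, 2]], [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]],
                rounds=500,
                step=lambda t: (t + 1) ** -0.6,
                noise=lambda t: 1.0 / (t + 1))
```

In this run both the empirical distributions and the centroid concentrate on action 0, the symmetric equilibrium in which every player chooses action 0, despite the non-standard step size and the imperfect best responses.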

The following theorem is the main result of the paper. It states that, if $\Gamma$ is an identical interests game, then under the
relatively weak assumptions A.2–A.3 players engaged in ECFP asymptotically learn elements of the sets of SNE and MCE.
Learning of MCE occurs in the sense that $d(q(t), MCE) \to 0$; this form of learning corresponds to the usual notion of
setwise convergence in empirical distribution found in classical FP (see Section II-C and lfT9l . ifTTIl '). Learning of SNE occurs
in the sense that $d(\bar q(t), SNE) \to 0$. This notion of learning, while similar in spirit to the typical notion of convergence in
empirical distribution, differs in that it is the empirical centroid distribution $\bar q$ that is converging to the set of SNE, rather
than the empirical distribution itself.

Theorem 1. Assume A.1–A.3 hold. Let $\mathcal{C}$ be a permutation-invariant partition of the player set $N$. Let $\{q(t), \bar q(t)\}_{t \ge 1}$ be an
ECFP process with respect to $(\Gamma, \mathcal{C})$. Then,

(i) players learn a subset of the MCE in the sense that $\lim_{t \to \infty} d(q(t), MCE) = 0$,

(ii) players learn a subset of the SNE in the sense that $\lim_{t \to \infty} d(\bar q(t), SNE) = 0$.

We note that if $\epsilon_t = 0$ and $\gamma_t = \frac{1}{t+1}$ then convergence of ECFP in the sense of Theorem 1 was established in our prior
work Q.

In order to prove Theorem 1 in its full generality we follow the approach of 0,1191: we first study the set of continuous-time
differential inclusions associated with ECFP, and then derive Theorem 1 from the continuous-time results via tools from the
theory of stochastic approximations.

In particular, Section IV discusses the notion of a perturbed solution of a differential inclusion, introduces the notion of a
chain transitive set, and presents key results that allow one to relate the limit sets of perturbed solutions to internally chain
transitive sets of the associated differential inclusion. Section V then presents continuous-time ECFP (CT-ECFP) and shows
convergence of CT-ECFP to the sets of SNE and MCE using Lyapunov arguments.

Section VI presents Lemmas 1 and 2 that relate the limit sets of DT-ECFP to the limit sets of CT-ECFP. Lemma 1 shows
that the limit sets of DT-ECFP are contained in the internally chain transitive sets of the corresponding CT-ECFP process.
This is accomplished by first showing that DT-ECFP processes may be considered to be perturbed solutions of the associated
CT-ECFP differential inclusion, and then invoking Theorem 2 to clinch the result. Lemma 2 then shows that the internally
chain transitive sets of CT-ECFP are contained in the sets of MCE and SNE. This is accomplished by invoking Proposition 1
together with the Lyapunov arguments derived for CT-ECFP processes in Section V.

The proof of Theorem 1 then follows by combining Lemmas 1 and 2, as noted in Section VI-A.


IV. Chain Transitive Sets


We study the limiting behavior of DT-ECFP by first studying the behavior of a continuous-time version of ECFP, and then
relating the limit sets of DT-ECFP to the limit sets of its continuous-time counterpart. In particular, we will relate the limit sets
of DT-ECFP to the chain transitive sets of CT-ECFP. Following the approach of El, 0, let $F$ denote a set-valued function
mapping each point $\xi \in \mathbb{R}^m$ to a set $F(\xi) \subseteq \mathbb{R}^m$. We assume:

A.4. (i) $F$ is a closed set-valued map.^7

(ii) $F(\xi)$ is a nonempty compact convex subset of $\mathbb{R}^m$ for all $\xi \in \mathbb{R}^m$.

(iii) For some norm $\|\cdot\|$ on $\mathbb{R}^m$, there exists $c > 0$ such that for all $\xi \in \mathbb{R}^m$, $\sup_{\eta \in F(\xi)} \|\eta\| \le c(1 + \|\xi\|)$.

Definition 1. A solution for the differential inclusion $\frac{dx}{dt} \in F(x)$ with initial point $\xi \in \mathbb{R}^m$ is an absolutely continuous mapping
$x : \mathbb{R} \to \mathbb{R}^m$ such that $x(0) = \xi$ and $\frac{dx(t)}{dt} \in F(x(t))$ for almost every $t \in \mathbb{R}$.


Definition 2. Let $\|\cdot\|$ be a norm on $\mathbb{R}^m$, and let $F : \mathbb{R}^m \rightrightarrows \mathbb{R}^m$ be a set-valued function satisfying A.4. Consider the
differential inclusion

$$\frac{dx}{dt} \in F(x). \tag{8}$$

(a) Given a set $X \subset \mathbb{R}^m$ and points $\xi$ and $\eta$, we write $\xi \hookrightarrow \eta$ if for every $\epsilon > 0$ and $T > 0$ there exist an integer $n \ge 1$,
solutions $x_1, \ldots, x_n$ to the differential inclusion and real numbers $t_1, \ldots, t_n$ greater than $T$ such that

(i) $x_i(s) \in X$, for all $0 \le s \le t_i$ and for all $i = 1, \ldots, n$,

(ii) $\|x_i(t_i) - x_{i+1}(0)\| \le \epsilon$ for all $i = 1, \ldots, n-1$,

(iii) $\|x_1(0) - \xi\| \le \epsilon$ and $\|x_n(t_n) - \eta\| \le \epsilon$.

(b) $X$ is said to be internally chain transitive if $X$ is compact and $\xi \hookrightarrow \xi$ for all $\xi \in X$.

The following theorem from [9] allows one to relate the set of limit points of certain discrete-time processes to the internally
chain transitive sets of their continuous-time counterparts.

Theorem 2. Assume $F : \mathbb{R}^m \rightrightarrows \mathbb{R}^m$ is a set-valued function satisfying A.4. Let $\{x(t)\}_{t \ge 1}$ be a process satisfying

$$x(t+1) - x(t) - \alpha_{t+1} M(t+1) \in \alpha_{t+1} F(x(t)), \tag{9}$$

where $\{\alpha_t\}_{t \ge 1}$ is a sequence of non-negative numbers such that

$$\sum_{t \ge 1} \alpha_t = \infty \quad \text{and} \quad \lim_{t \to \infty} \alpha_t = 0,$$

and $\{M(t)\}_{t \ge 1}$ is a sequence of deterministic or random perturbations. If

(a) for all $T > 0$,

$$\lim_{t \to \infty} \sup_{k} \left\{ \left\| \sum_{i=t}^{k-1} \alpha_{i+1} M(i+1) \right\| : \sum_{i=t}^{k-1} \alpha_{i+1} \le T \right\} = 0,$$

(b) $\sup_{t \ge 1} \|x(t)\| < \infty$,

then the set of limit points of $\{x(t)\}_{t \ge 1}$ is a connected internally chain transitive set of the differential inclusion
$\frac{d}{dt} x(t) \in F(x(t))$.

In an abuse of terminology^8 we sometimes refer to a discrete-time process $\{x(t)\}_{t \ge 1}$ verifying the recursion (9) as a
perturbed solution of (8).

^7 I.e., $\mathrm{Graph}(F) := \{(\xi, \eta) : \eta \in F(\xi)\}$ is a closed subset of $\mathbb{R}^m \times \mathbb{R}^m$.

The differential inclusion (8) induces a set-valued dynamical system $\{\Phi_t\}_{t \in \mathbb{R}}$ defined by

$$\Phi_t(x_0) := \{x(t) : x \text{ is a solution to (8) with } x(0) = x_0\}.$$

Let $\Lambda$ be any subset of $\mathbb{R}^m$. A continuous function $V : \mathbb{R}^m \to \mathbb{R}$ is called a Lyapunov function for $\Lambda$ if $V(y) < V(x_0)$ for
all $x_0 \in \mathbb{R}^m \setminus \Lambda$, $y \in \Phi_t(x_0)$, $t > 0$, and $V(y) \le V(x_0)$ for all $x_0 \in \Lambda$, $y \in \Phi_t(x_0)$ and $t \ge 0$. The following proposition (
0 , Proposition 3.27) allows one to relate the chain transitive sets of a differential inclusion to Lyapunov attracting sets.

Proposition 1. Suppose that $V$ is a Lyapunov function for $\Lambda$. Assume that $V(\Lambda)$, the image of $\Lambda$ under $V$, has empty interior.
Then every internally chain transitive set $L$ is contained in $\Lambda$ and $V|_L$, the restriction of $V$ to the set $L$, is constant.


V. Continuous-Time ECFP

In this section we consider a continuous-time version of ECFP. Let $\Gamma$ satisfy A.1 and let $\mathcal{C}$ be a permutation-invariant
partition of $N$. In analogy to^9 (6), for $t \ge 0$ let

$$\dot q^c(t) \in BR(\bar q^c(t)) - q^c(t), \tag{10}$$

where we use the superscript in $q^c(t)$ to indicate a continuous-time analog of the empirical distribution, and where, for $p \in \Delta^n$,
we let $BR(p) := BR^{\epsilon}(p)$ with $\epsilon = 0$, and $\bar q^c(t)$ is the centroid distribution associated with $q^c(t)$ (see (3)). We refer to the
process $\{q^c(t), \bar q^c(t)\}_{t \ge 0}$ as a continuous-time ECFP (CT-ECFP) process relative to $(\Gamma, \mathcal{C})$.

As our end goal involves studying the limiting behavior of $\bar q^c(t)$, note that for $k \in I$, and $\bar q^{c,k}(t)$ defined similar to (2),
there holds

$$\dot{\bar q}^{c,k}(t) = \frac{d}{dt} \Big( |C_k|^{-1} \sum_{j \in C_k} q_j^c(t) \Big) = |C_k|^{-1} \sum_{j \in C_k} \dot q_j^c(t).$$

Let $p(t) = \dot q^c(t) + q^c(t)$, so that $\bar p(t) = (\bar p_1(t), \ldots, \bar p_n(t))$ with $\bar p_i(t) = \bar p^{\phi(i)}(t)$ for $i \in N$ (see (3)). By the above, and the
linearity of differentiation, $\bar p(t) = \dot{\bar q}^c(t) + \bar q^c(t)$. Thus, by Lemma 4, (10) implies that $\bar p(t) \in BR(\bar q^c(t))$, or equivalently,

$$\dot{\bar q}^c(t) \in BR(\bar q^c(t)) - \bar q^c(t). \tag{11}$$

^8 A perturbed solution of (8) typically refers to a continuous-time process that is associated with (8) by means of an integrable perturbation. Under appropriate
conditions, processes of the form (9) may be transformed via an interpolation procedure into a continuous-time process satisfying the typical definition of a
perturbed solution. See (^, Section I for more details.

^9 Note that (6) may be written as $q(t+1) - q(t) \in \gamma_t \left( BR^{\epsilon_t}(\bar q(t)) - q(t) \right)$.

A. Convergence in Continuous Time 

This section studies the convergence of continuous-time ECFP to the sets of SNE and MCE. 

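For any solution $q^c(t)$ of (10) and associated centroid process $\bar q^c(t)$, let $w(t) := U(\bar q^c(t))$ and let $v(t) := \frac{1}{n} \sum_{i=1}^{n} U(q_i^c(t), \bar q_{-i}^c(t))$.

There holds,

$$\begin{aligned}
\dot w(t) &= \sum_{i=1}^{n} \frac{\partial U}{\partial p_i}(\bar q^c(t))\, \dot{\bar q}_i^c(t) \\
&\ge \sum_{i=1}^{n} \left[ U(\bar q_i^c(t) + \dot{\bar q}_i^c(t), \bar q_{-i}^c(t)) - U(\bar q^c(t)) \right] \\
&= \sum_{i=1}^{n} \left[ \max_{\alpha_i \in \Delta_i} U(\alpha_i, \bar q_{-i}^c(t)) - U(\bar q^c(t)) \right] \\
&\ge 0,
\end{aligned} \tag{12}$$

where the second line follows from the concavity of $U$ in $p_i$, and the third follows from (11).
By Lemma 3 there holds

$$\frac{1}{n} \sum_{i=1}^{n} U(q_i^c(t), \bar q_{-i}^c(t)) = U(\bar q^c(t)). \tag{13}$$

Hence $v(t) = w(t)$, and there holds $\dot v(t) \ge 0$. Moreover, the following expansion is useful in order to study $v$ as a Lyapunov
function for the set of MCE:

$$\begin{aligned}
\dot v(t) = \dot w(t) &\ge \sum_{i=1}^{n} \left[ \max_{\alpha_i \in \Delta_i} U(\alpha_i, \bar q_{-i}^c(t)) - U(\bar q^c(t)) \right] \\
&= \sum_{i=1}^{n} \max_{\alpha_i \in \Delta_i} U(\alpha_i, \bar q_{-i}^c(t)) - n\, U(\bar q^c(t)) \\
&= \sum_{i=1}^{n} \max_{\alpha_i \in \Delta_i} U(\alpha_i, \bar q_{-i}^c(t)) - \sum_{i=1}^{n} U(q_i^c(t), \bar q_{-i}^c(t)) \\
&= \sum_{i=1}^{n} \left[ \max_{\alpha_i \in \Delta_i} U(\alpha_i, \bar q_{-i}^c(t)) - U(q_i^c(t), \bar q_{-i}^c(t)) \right] \\
&\ge 0,
\end{aligned} \tag{14}$$

where the inequality follows from (12), and the third line follows again from (13).

By (12), $w(t)$ is weakly increasing, and is constant in a time interval $T$ if and only if $\max_{\alpha_i \in \Delta_i} U(\alpha_i, \bar q_{-i}^c(t)) - U(\bar q^c(t)) = 0$,
$\forall i$; i.e., if and only if $\bar q^c(t) \in SNE$ for all $t \in T$.

By (14), $v(t)$ is weakly increasing, and $\dot v(t) = 0$ in some interval $T$ implies $\max_{\alpha_i \in \Delta_i} U(\alpha_i, \bar q_{-i}^c(t)) - U(q_i^c(t), \bar q_{-i}^c(t)) = 0$,
$\forall i \in N$, $t \in T$; i.e., $q^c(t) \in MCE$ for all $t \in T$. Moreover, by Lemma 4, $q^c(t) \in MCE$, $\forall t \in T$ $\implies$ $\bar q^c(t) \in SNE$, $\forall t \in T$,
which by the above comments implies $\dot w(t) = 0$ in $T$, or equivalently $\dot v(t) = 0$ in $T$. Thus, $v(t)$ is constant in a time interval
$T$ if and only if $q^c(t) \in MCE$ for all $t \in T$.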

Proposition 2. Assume A.1 holds. Then,

(i) The limit set of every solution of (11) is a connected subset of SNE along which $U$ is constant;

(ii) For $p \in \Delta^n$, let $V(p) := \frac{1}{n} \sum_{i=1}^{n} U(p_i, \bar p_{-i})$. The limit set of every solution of (10) is a connected subset of MCE along
which $V$ is constant.


The proof of this proposition follows from the above comments. 

VI. Limit Sets of Discrete-Time ECFP 

In this section we study the limit sets of DT-ECFP by relating them to the internally chain transitive sets of CT-ECFP. 

The following lemma relates the limit sets of DT-ECFP to the internally chain transitive sets of the CT-ECFP differential
inclusions (10) and (11).

Lemma 1. Assume A.1–A.3 hold. Let $\{q(t), \bar q(t)\}_{t \ge 1}$ be a discrete-time ECFP process. Then,

(i) The set of limit points of $\{\bar q(t)\}_{t \ge 1}$ is a connected internally chain transitive set of (11).

(ii) The set of limit points of $\{q(t)\}_{t \ge 1}$ is a connected internally chain transitive set of (10).

Proof: Proof of (ii): Observe that adding and subtracting the set $BR(\bar q(t))$ and rearranging terms in (6) gives,

$$q(t+1) - q(t) + \gamma_t \left\{ BR(\bar q(t)) - BR^{\epsilon_t}(\bar q(t)) \right\} \in \gamma_t \left\{ BR(\bar q(t)) - q(t) \right\}. \tag{15}$$

Thus, the process $\{q(t)\}_{t \ge 1}$ fits the template of Theorem 2 with $x(t) := q(t)$, $F(x) := BR(\bar x) - x$, and $M(t) := BR(\bar x) - BR^{\epsilon_t}(\bar x)$.
It is straightforward to verify that $F$ satisfies A.4.

It suffices to show that the process (15) satisfies conditions (a) and (b) of Theorem 2. Condition (b) is trivially satisfied
since $q(t) \in \Delta^n$ for all $t$.

If $k$ is such that $\sum_{i=t}^{k-1} \gamma_i \le T$, then

$$\left\| \sum_{i=t}^{k-1} \gamma_i M(i) \right\| \le T \sup_{i \ge t} \left\| BR^{\epsilon_i}(\bar q(i)) - BR(\bar q(i)) \right\|.$$

Since $BR$ is upper semicontinuous, $BR^{\epsilon}(p) \to BR(p)$ uniformly as $\epsilon \to 0$. Thus condition (a) holds, and (ii) is proved.

Proof of (i): Adding and subtracting $BR(\bar q(t))$ and rearranging terms, the difference inclusion (7) may be written as

$$\bar q(t+1) - \bar q(t) + \gamma_t \left\{ BR(\bar q(t)) - BR^{\epsilon_t}(\bar q(t)) \right\} \in \gamma_t \left\{ BR(\bar q(t)) - \bar q(t) \right\}. \tag{16}$$

Thus, the process $\{\bar q(t)\}$ fits the template of Theorem 2 with $x(t) := \bar q(t)$, $M(t) := BR(\bar q(t)) - BR^{\epsilon_t}(\bar q(t))$, and $F(x) := BR(x) - x$.
It was shown in [9] that $F$ satisfies A.4. It is sufficient to show that the process (16) satisfies conditions (a) and
(b) of Theorem 2. Condition (b) is trivially satisfied since $\bar q(t) \in \Delta^n$ for all $t$. Since $M(t)$ is defined in the same manner as in
case (ii), the proof that condition (a) is satisfied follows the same reasoning as in the proof of assertion (ii). ■

The following lemma relates internally chain transitive sets of (10) to the set of MCE, and the internally chain transitive
sets of (11) to the set of SNE; combined with Lemma 1 this will prove Theorem 1.

Lemma 2. Let $\Gamma$ be an identical interests game. Let $\mathcal{C}$ be a permutation-invariant partition on $\Gamma$. Then,

(i) Every internally chain transitive set of (11) is contained in the set of SNE.

(ii) Every internally chain transitive set of (10) is contained in the set of MCE.

Proof: Proof of (i): Let $W := -U$. By Section V-A (in particular, see (12)), $W$ is a Lyapunov function for the set of
SNE with $x(t) := \bar q(t)$. Note that $W$ is multilinear and hence continuously differentiable.

For a differentiable function $f : \mathbb{R}^m \to \mathbb{R}$, we say $x \in \mathbb{R}^m$ is a critical point of $f$ if for $i = 1, \ldots, m$, the partial derivative
at $x$ is zero, i.e., $\frac{\partial}{\partial x_i} f(x) = 0$. By Sard’s Theorem ([20], p. 69), if $CP$ is the critical points set of $W$, then $W(CP)$ contains
no intervals. By definition, the set of NE is contained in the critical points set of $U$, and hence also contained in the critical
points of $W$. Furthermore, by definition, $SNE \subseteq NE$, and hence the set SNE is contained in the critical points set of $W$.
Thus, by Proposition 1 every internally chain transitive set of (11) is contained in the set SNE.

Proof of (ii): Note that, by Lemma 4, $p \in BR(\bar p) \implies \bar p \in BR(\bar p)$. Thus, $p \in MCE$ implies that $\bar p \in SNE$. Let $V : \Delta^n \to \mathbb{R}$,
with $V(p) := \frac{1}{n} \sum_{i=1}^{n} U(p_i, \bar p_{-i})$, and note that by Lemma 3 $V(p) = U(\bar p)$. Invoking again Sard’s Theorem, $U(NE)$ contains
no intervals, and hence $U(SNE) \subseteq U(NE)$ contains no intervals. Since $U(SNE)$ contains no intervals, $V(MCE)$ also
contains no intervals.

By Section V (in particular, see (14)) the function $-V$ is a Lyapunov function for the set of MCE with $x(t) := q(t)$. It follows
from Proposition 1 that every chain transitive set of (10) is contained in MCE. ■


A. Proof of Theorem 1 

Theorem 1 follows directly from Lemmas 1 and 2.

VII. Conclusions 

Classical Fictitious Play (FP) is robust to alterations in the empirical distribution step-size process and robust to best-response
perturbations. These robustness properties allow for interesting modifications to FP which can be of great practical
value. Empirical Centroid Fictitious Play (ECFP) is a generalization of FP designed for large games. The paper showed that
ECFP is also robust to step-size alterations and best-response perturbations. This result enables future research to consider
practical modifications to ECFP, similar to those already developed for FP.


Appendix 

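Lemma 3. Let $\mathcal{C}$ be a permutation-invariant partition of $N$, and for $p \in \Delta^n$ let $\bar p$ be as defined in (3). Then $\frac{1}{n} \sum_{i=1}^{n} U(p_i, \bar p_{-i}) = U(\bar p)$.

Proof: Let $I$ be an index set for $\mathcal{C}$ and let $m$ be the cardinality of $I$. For $k \in I$, and $j \in C_k$ note that

$$\begin{aligned}
|C_k|\, U(\bar p) &= |C_k|\, U(\bar p_j, \bar p_{-j}) \\
&= |C_k|\, U\Big( |C_k|^{-1} \sum_{i \in C_k} p_i,\ \bar p_{-j} \Big) \\
&= \sum_{i \in C_k} U([p_i]_j, \bar p_{-j}) \\
&= \sum_{i \in C_k} U(p_i, \bar p_{-i}),
\end{aligned}$$

where the second line follows from the definition of $\bar p_j$ (see (3)), the third by multilinearity of $U$, and the fourth by permutation
invariance of elements in $C_k$. Thus,

$$\frac{1}{n} \sum_{i=1}^{n} U(p_i, \bar p_{-i}) = \frac{1}{n} \sum_{k \in I} \sum_{i \in C_k} U(p_i, \bar p_{-i}) = \frac{1}{n} \sum_{k \in I} |C_k|\, U(\bar p) = \frac{1}{n} \sum_{i=1}^{n} U(\bar p) = U(\bar p).$$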
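The identity of Lemma 3 can be sanity-checked numerically on a small game. The sketch below is only an illustration under stated assumptions: the three-player payoff function is hypothetical and was chosen to be symmetric in players 0 and 1 (so that the classes {0, 1} and {2} satisfy the permutation-invariance conditions), and the helper names are not from the paper.

```python
from itertools import product


def mixed_utility(u, strategies):
    """U(p): sum over joint actions y of u[y] * prod_i p_i(y_i)."""
    total = 0.0
    for y in product(*(range(len(p)) for p in strategies)):
        weight = 1.0
        for p, yi in zip(strategies, y):
            weight *= p[yi]
        total += u[y] * weight
    return total


def centroid_distribution(p, classes):
    """Replace each player's strategy by the average over that player's class."""
    p_bar = [None] * len(p)
    for members in classes:
        m = len(p[members[0]])
        avg = [sum(p[i][k] for i in members) / len(members) for k in range(m)]
        for i in members:
            p_bar[i] = list(avg)
    return p_bar


# Payoff symmetric under swapping the actions of players 0 and 1.
u = {y: float((y[0] + y[1]) * (1 + 2 * y[2]) + y[0] * y[1])
     for y in product(range(2), repeat=3)}
classes = [[0, 1], [2]]
p = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]
p_bar = centroid_distribution(p, classes)

n = len(p)
# Left side: average of U(p_i, p_bar_{-i}); right side: U(p_bar).
lhs = sum(mixed_utility(u, p_bar[:i] + [p[i]] + p_bar[i + 1:]) for i in range(n)) / n
rhs = mixed_utility(u, p_bar)
```

The two sides agree up to floating-point error, as the lemma predicts; with a payoff that is not symmetric within a class, the same check would generally fail.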


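Lemma 4. Let $q \in \Delta^n$, let $\bar q$ be as defined in (3), and let $\epsilon \ge 0$. If $p \in BR^{\epsilon}(\bar q)$, then $\bar p \in BR^{\epsilon}(\bar q)$.

Proof: Let $i \in N$. Recall that $\bar p := (\bar p_1, \ldots, \bar p_n)$ with $\bar p_i = \bar p^{\phi(i)}$. There holds

$$\begin{aligned}
U(\bar p_i, \bar q_{-i}) &= U\Big( |C_{\phi(i)}|^{-1} \sum_{j \in C_{\phi(i)}} p_j,\ \bar q_{-i} \Big) \\
&= |C_{\phi(i)}|^{-1} \sum_{j \in C_{\phi(i)}} U([p_j]_i, \bar q_{-i}) \\
&= |C_{\phi(i)}|^{-1} \sum_{j \in C_{\phi(i)}} U(p_j, \bar q_{-j}) \\
&\ge |C_{\phi(i)}|^{-1} \sum_{j \in C_{\phi(i)}} \Big( \max_{\alpha_j \in \Delta_j} U(\alpha_j, \bar q_{-j}) - \epsilon \Big) \\
&= |C_{\phi(i)}|^{-1} \sum_{j \in C_{\phi(i)}} \Big( \max_{\alpha_i \in \Delta_i} U(\alpha_i, \bar q_{-i}) - \epsilon \Big) \\
&= \max_{\alpha_i \in \Delta_i} U(\alpha_i, \bar q_{-i}) - \epsilon,
\end{aligned}$$

where the first equality follows by the definition of $\bar p_i$ (see (2)), the second from the multilinearity of $U$, the third by permutation
invariance of elements of $C_{\phi(i)}$, the inequality by the fact that, by hypothesis, $p_j \in BR_j^{\epsilon}(\bar q_{-j})$, and the next equality again by permutation
invariance of elements of $C_{\phi(i)}$. Since this holds for all $i \in N$, it follows that $\bar p \in BR^{\epsilon}(\bar q)$. ■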

Lemma 5. Assume A.1 holds and suppose the action sequence $\{a(t)\}_{t \ge 1}$ is chosen according to (5). Then the centroid process
$\{\bar q(t)\}_{t \ge 1}$ follows the difference inclusion (7).

Proof: Note that $\bar q(t+1)$ may be written recursively as

$$\bar q(t+1) = \bar q(t) + \gamma_t \left( \bar a(t+1) - \bar q(t) \right).$$

By Lemma 4, $a(t+1) \in BR^{\epsilon_t}(\bar q(t))$ implies $\bar a(t+1) \in BR^{\epsilon_t}(\bar q(t))$. Substituting this into the above recursion and rearranging
terms shows that $\{\bar q(t)\}$ follows the difference inclusion (7). ■